Failure Detectors as First Class Objects
Abstract
One of the fundamental differences between a centralized system and a distributed one is the notion of partial failures. The ability to detect failures efficiently and accurately is a key element underlying reliable distributed computing. In current distributed systems, however, failure detection is either left to the application developer or hidden from the programmer and provided in an ad hoc manner behind the scenes. We argue for an intermediate approach in which failure detectors are first-class objects. We view failure detection as an abstraction whose complexity is encapsulated behind well-defined interfaces, and we represent the various roles of a failure detection service as first-class objects. Following our approach, one can reuse existing failure detection protocols as they are or, through composition or refinement, define new protocols that match the application requirements. We describe an interesting result of a composition that mixes push and pull failure monitoring, and we show how scalability issues may be addressed by using a hierarchical failure detection configuration. We also discuss the implementation of our failure detection service in both CORBA and Java.
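The abstract describes failure detection roles as first-class objects and a composition that mixes push and pull monitoring. The Python sketch below shows one plausible way such a mix could work. It is not the authors' CORBA or Java API: the interfaces `Monitorable` and `Notifiable`, the `DualMonitor` class, the `timeout` parameter, and the explicit simulated clock (`now`) are all hypothetical, chosen so the example is deterministic. The monitor trusts heartbeats (push) while they arrive within `timeout`, and falls back to an explicit `is_alive()` query (pull) only for objects whose heartbeats are late.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ALIVE = "alive"
    SUSPECTED = "suspected"


class Monitorable(ABC):
    """Hypothetical role: an object whose liveness can be queried (pull)."""

    @abstractmethod
    def is_alive(self) -> bool:
        """Answer a pull-style liveness query."""


class Notifiable(ABC):
    """Hypothetical role: an object told when a suspicion changes."""

    @abstractmethod
    def notify(self, target: str, status: Status) -> None: ...


@dataclass
class Process(Monitorable):
    name: str
    crashed: bool = False

    def is_alive(self) -> bool:
        # A crashed process cannot answer; model that as False.
        return not self.crashed


@dataclass
class DualMonitor:
    """Hypothetical push/pull mix: pull only when heartbeats are late."""
    timeout: float
    listeners: list[Notifiable] = field(default_factory=list)
    targets: dict[str, Monitorable] = field(default_factory=dict)
    last_heartbeat: dict[str, float] = field(default_factory=dict)
    status: dict[str, Status] = field(default_factory=dict)
    pulls: int = 0  # number of pull queries issued so far

    def watch(self, name: str, obj: Monitorable, now: float) -> None:
        self.targets[name] = obj
        self.last_heartbeat[name] = now
        self.status[name] = Status.ALIVE

    def heartbeat(self, name: str, now: float) -> None:
        # Push path: the monitored object announces that it is alive.
        self.last_heartbeat[name] = now

    def check(self, now: float) -> None:
        for name, obj in self.targets.items():
            if now - self.last_heartbeat[name] <= self.timeout:
                new = Status.ALIVE  # recent heartbeat, no query needed
            else:
                self.pulls += 1  # pull path: query the object directly
                if obj.is_alive():
                    self.last_heartbeat[name] = now
                    new = Status.ALIVE
                else:
                    new = Status.SUSPECTED
            if new != self.status[name]:
                self.status[name] = new
                for listener in self.listeners:
                    listener.notify(name, new)


class Recorder(Notifiable):
    """Collects notifications so they can be inspected."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Status]] = []

    def notify(self, target: str, status: Status) -> None:
        self.events.append((target, status))


# Usage: p keeps pushing heartbeats; q is silent, so it must be pulled.
p, q = Process("p"), Process("q")
rec = Recorder()
mon = DualMonitor(timeout=2.0, listeners=[rec])
mon.watch("p", p, now=0.0)
mon.watch("q", q, now=0.0)

mon.heartbeat("p", now=1.5)
mon.check(now=3.0)   # q is late but answers the pull
q.crashed = True
mon.heartbeat("p", now=4.0)
mon.check(now=5.5)   # q is late again and now fails the pull
print(rec.events, mon.pulls)
```

In this trace the monitor queries `q` twice and never queries `p`, because `p`'s heartbeats always arrive within the timeout; only the change of `q` to `SUSPECTED` is reported to the listener.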
Publication date: 1999